Fast High-Dimensional Data Search in Incomplete Databases
Abstract
We propose and evaluate two indexing schemes for improving the efficiency of data retrieval in high-dimensional databases that are incomplete. These schemes are novel in that the search keys may contain missing attribute values. The first is a multi-dimensional index structure, called the Bitstring-augmented R-tree (BR-tree), whereas the second comprises a family of multiple one-dimensional one-attribute (MOSAIC) indexes. Our results show that both schemes can outperform exhaustive search. Experimental results suggest that, compared to the MOSAIC indexing scheme, BR-trees have lower update and storage costs and support range queries more efficiently under most circumstances. However, contrary to conventional wisdom, the MOSAIC structure outperforms the BR-tree in retrieval time for point queries, as well as in range queries over incomplete databases for dimension-unrestricted data distributions.
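The idea of a family of one-attribute indexes whose keys may be missing can be illustrated with a small Python sketch. This is not the paper's MOSAIC implementation: the class `OneAttributeIndexes`, the `MISSING` marker, and the helper `missing_bitstring` are hypothetical names, sorted lists stand in for real one-dimensional index structures, and the sketch assumes that a missing value counts as a possible match for any query condition, which the abstract does not specify.

```python
from bisect import bisect_left, bisect_right

MISSING = None  # hypothetical marker for an unknown attribute value


def missing_bitstring(record):
    """Return a string with '1' at each position whose value is missing."""
    return "".join("1" if v is MISSING else "0" for v in record)


class OneAttributeIndexes:
    """Toy family of one-dimensional indexes: one sorted list per attribute."""

    def __init__(self, records):
        self.records = records
        dims = len(records[0]) if records else 0
        self.sorted_keys = []  # per attribute: sorted (value, record id) pairs
        self.missing_ids = []  # per attribute: ids whose value is missing
        for d in range(dims):
            known = sorted(
                (r[d], i) for i, r in enumerate(records) if r[d] is not MISSING
            )
            self.sorted_keys.append(known)
            self.missing_ids.append(
                {i for i, r in enumerate(records) if r[d] is MISSING}
            )

    def range_query(self, lows, highs):
        """Return ids of records that may fall inside [lows, highs].

        Assumption: a record with a missing value on an attribute is kept
        as a candidate for that attribute.
        """
        result = None
        for d, (lo, hi) in enumerate(zip(lows, highs)):
            keys = self.sorted_keys[d]
            # Binary search for the slice of known values within [lo, hi].
            start = bisect_left(keys, (lo, -1))
            end = bisect_right(keys, (hi, len(self.records)))
            candidates = {i for _, i in keys[start:end]}
            candidates |= self.missing_ids[d]
            # Intersect the per-attribute candidate sets.
            result = candidates if result is None else result & candidates
        return sorted(result or [])


records = [(1, 5), (2, MISSING), (MISSING, 9), (8, 8)]
index = OneAttributeIndexes(records)
range_hits = index.range_query(lows=(0, 4), highs=(3, 8))
point_hits = index.range_query(lows=(2, 7), highs=(2, 7))
```

A point query is expressed here as a range query with equal bounds. Each attribute is searched independently and the candidate sets are intersected, so the cost of a query grows with the number of attributes it constrains.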
Similar resources
Fast Near Neighbor Search in High-Dimensional Binary Data
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data, which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating has...
MLR-Index: An Index Structure for Fast and Scalable Similarity Search in High Dimensions
High-dimensional indexing has been widely used for performing similarity search over various data types such as multimedia (audio/image/video) databases, document collections, time-series data, sensor data and scientific databases. Because of the curse of dimensionality, it is known that established data structures like kd-tree, R-tree, and M-tree suffer in their performance over...
RIVA: Indexing and Visualization of High-Dimensional Data Via Dimension Reorderings
We propose a new representation for high-dimensional data that can prove very effective for visualization, nearest neighbor (NN) and range searches. It has been unequivocally demonstrated that existing index structures cannot facilitate efficient search in high-dimensional spaces. We show that a transformation from points to sequences can potentially diminish the negative effects of the dimensi...
Fast Nearest-Neighbor Search Algorithms Based on High-Multidimensional Data
Similarity search in multimedia databases requires an efficient support of nearest-neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state of the art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore pre-compute the result of any nearest-neighbor s...
Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance
Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of full space are unable to effectively screen out informative outliers. This is because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection t...
Publication date: 1998